Biobío Region
ASTROCO: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings
Tan, Antony, Protopapas, Pavlos, Cádiz-Leyton, Martina, Cabrera-Vives, Guillermo, Donoso-Oliva, Cristobal, Becker, Ignacio
We present AstroCo, a Conformer-style encoder for irregular stellar light curves. By combining attention with depthwise convolutions and gating, AstroCo captures both global dependencies and local features. On MACHO R-band, AstroCo outperforms Astromer v1 and v2, yielding 70 percent and 61 percent lower error respectively and a relative macro-F1 gain of about 7 percent, while producing embeddings that transfer effectively to few-shot classification. These results highlight AstroCo's potential as a strong and label-efficient foundation for time-domain astronomy.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.05)
- North America > United States (0.05)
LLM-Based Instance-Driven Heuristic Bias In the Context of a Biased Random Key Genetic Algorithm
Sartori, Camilo Chacón, Pino, Martín Isla, Pinacho-Davidson, Pedro, Blum, Christian
Integrating Large Language Models (LLMs) within metaheuristics opens a novel path for solving complex combinatorial optimization problems. While most existing approaches leverage LLMs for code generation to create or refine specific heuristics, they often overlook the structural properties of individual problem instances. In this work, we introduce a novel framework that integrates LLMs with a Biased Random-Key Genetic Algorithm (BRKGA) to solve the NP-hard Longest Run Subsequence problem. Our approach extends the instance-driven heuristic bias paradigm by introducing a human-LLM collaborative process to co-design and implement a set of computationally efficient metrics. The LLM analyzes these instance-specific metrics to generate a tailored heuristic bias, which steers the BRKGA toward promising areas of the search space. We conduct a comprehensive experimental evaluation, including rigorous statistical tests, convergence and behavioral analyses, and targeted ablation studies, comparing our method against a standard BRKGA baseline across 1,050 generated instances of varying complexity. Results show that our top-performing hybrid, BRKGA+Llama-4-Maverick, achieves statistically significant improvements over the baseline, particularly on the most complex instances. Our findings confirm that leveraging an LLM to produce an a priori, instance-driven heuristic bias is a valuable approach for enhancing metaheuristics in complex optimization domains.
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.04)
- North America > Montserrat (0.04)
- Europe > Switzerland (0.04)
- (3 more...)
A Biased Random Key Genetic Algorithm for Solving the Longest Run Subsequence Problem
Blum, Christian, Pinacho-Davidson, Pedro
The longest run subsequence (LRS) problem is an NP-hard combinatorial optimization problem belonging to the class of subsequence problems from bioinformatics. In particular, the problem plays a role in genome reassembly. In this paper, we present a solution to the LRS problem using a Biased Random Key Genetic Algorithm (BRKGA). Our approach places particular focus on the computational efficiency of evaluating individuals, which involves converting vectors of gray values into valid solutions to the problem. For comparison purposes, a Max-Min Ant System is developed and implemented. This is in addition to the application of the integer linear programming solver CPLEX for solving all considered problem instances. The computation results show that the proposed BRKGA is currently a state-of-the-art technique for the LRS problem. Nevertheless, the results also show that there is room for improvement, especially in the context of input strings based on large alphabet sizes.
- Europe > Germany (0.04)
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.04)
- Europe > Spain (0.04)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
The TUB Sign Language Corpus Collection
Avramidis, Eleftherios, Czehmann, Vera, Deckert, Fabian, Hufe, Lorenz, Lipski, Aljoscha, Villalobos, Yuni Amaloa Quintero, Rhee, Tae Kwon, Shi, Mengqian, Stölting, Lennart, Nunnari, Fabrizio, Möller, Sebastian
We present a collection of parallel corpora of 12 sign languages in video format, together with subtitles in the dominant spoken languages of the corresponding countries. The entire collection includes more than 1,300 hours in 4,381 video files, accompanied by 1,3~M subtitles containing 14~M tokens. Most notably, it includes the first consistent parallel corpora for 8 Latin American sign languages, whereas the size of the German Sign Language corpora is ten times the size of the previously available corpora. The collection was created by collecting and processing videos of multiple sign languages from various online sources, mainly broadcast material of news shows, governmental bodies and educational channels. The preparation involved several stages, including data collection, informing the content creators and seeking usage approvals, scraping, and cropping. The paper provides statistics on the collection and an overview of the methods used to collect the data.
- North America > Mexico (0.14)
- Europe > Germany > Berlin (0.07)
- South America > Colombia (0.05)
- (18 more...)
The Geometry of Meaning: Perfect Spacetime Representations of Hierarchical Structures
Anabalon, Andres, Garces, Hugo, Oliva, Julio, Cifuentes, Jose
We show that there is a fast algorithm that embeds hierarchical structures in three-dimensional Minkowski spacetime. The correlation of data ends up purely encoded in the causal structure. Our model relies solely on oriented token pairs -- local hierarchical signals -- with no access to global symbolic structure. We apply our method to the corpus of \textit{WordNet}. We provide a perfect embedding of the mammal sub-tree including ambiguities (more than one hierarchy per node) in such a way that the hierarchical structures get completely codified in the geometry and exactly reproduce the ground-truth. We extend this to a perfect embedding of the maximal unambiguous subset of the \textit{WordNet} with 82{,}115 noun tokens and a single hierarchy per token. We introduce a novel retrieval mechanism in which causality, not distance, governs hierarchical access. Our results seem to indicate that all discrete data has a perfect geometrical representation that is three-dimensional. The resulting embeddings are nearly conformally invariant, indicating deep connections with general relativity and field theory. These results suggest that concepts, categories, and their interrelations, namely hierarchical meaning itself, is geometric.
A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models
Pinacho-Davidson, Pedro, Gutierrez, Fernando, Zapata, Pablo, Vergara, Rodolfo, Aqueveque, Pablo
The emergence of Generative AI (Gen AI) and Large Language Models (LLMs) has enabled more advanced chatbots capable of human-like interactions. However, these conversational agents introduce a broader set of operational risks that extend beyond traditional cybersecurity considerations. In this work, we propose a novel, instrumented risk-assessment metric that simultaneously evaluates potential threats to three key stakeholders: the service-providing organization, end users, and third parties. Our approach incorporates the technical complexity required to induce erroneous behaviors in the chatbot--ranging from non-induced failures to advanced prompt-injection attacks--as well as contextual factors such as the target industry, user age range, and vulnerability severity. To validate our metric, we leverage Garak, an open-source framework for LLM vulnerability testing. We further enhance Garak to capture a variety of threat vectors (e.g., misinformation, code hallucinations, social engineering, and malicious code generation). Our methodology is demonstrated in a scenario involving chatbots that employ retrieval-augmented generation (RAG), showing how the aggregated risk scores guide both short-term mitigation and longer-term improvements in model design and deployment. The results underscore the importance of multi-dimensional risk assessments in operationalizing secure, reliable AI-driven conversational systems.
- North America > United States (0.14)
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Astromer 2
Donoso-Oliva, Cristobal, Becker, Ignacio, Protopapas, Pavlos, Cabrera-Vives, Guillermo, Cádiz-Leyton, Martina, Moreno-Cartagena, Daniel
Foundational models have emerged as a powerful paradigm in deep learning field, leveraging their capacity to learn robust representations from large-scale datasets and effectively to diverse downstream applications such as classification. In this paper, we present Astromer 2 a foundational model specifically designed for extracting light curve embeddings. We introduce Astromer 2 as an enhanced iteration of our self-supervised model for light curve analysis. This paper highlights the advantages of its pre-trained embeddings, compares its performance with that of its predecessor, Astromer 1, and provides a detailed empirical analysis of its capabilities, offering deeper insights into the model's representations. Astromer 2 is pretrained on 1.5 million single-band light curves from the MACHO survey using a self-supervised learning task that predicts randomly masked observations within sequences. Fine-tuning on a smaller labeled dataset allows us to assess its performance in classification tasks. The quality of the embeddings is measured by the F1 score of an MLP classifier trained on Astromer-generated embeddings. Our results demonstrate that Astromer 2 significantly outperforms Astromer 1 across all evaluated scenarios, including limited datasets of 20, 100, and 500 samples per class. The use of weighted per-sample embeddings, which integrate intermediate representations from Astromer's attention blocks, is particularly impactful. Notably, Astromer 2 achieves a 15% improvement in F1 score on the ATLAS dataset compared to prior models, showcasing robust generalization to new datasets. This enhanced performance, especially with minimal labeled data, underscores the potential of Astromer 2 for more efficient and scalable light curve analysis.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network
de la Fuente, Raúl, Radrigan, Luciano, Morales, Anibal S
Mining machinery operating in variable environments faces high wear and unpredictable stress, challenging Predictive Maintenance (PdM). This paper introduces the Edge Sensor Network for Predictive Maintenance (ESN-PdM), a hierarchical inference framework across edge devices, gateways, and cloud services for real-time condition monitoring. The system dynamically adjusts inference locations--on-device, on-gateway, or on-cloud--based on trade-offs among accuracy, latency, and battery life, leveraging Tiny Machine Learning (TinyML) techniques for model optimization on resource-constrained devices. Performance evaluations showed that on-sensor and on-gateway inference modes achieved over 90\% classification accuracy, while cloud-based inference reached 99\%. On-sensor inference reduced power consumption by approximately 44\%, enabling up to 104 hours of operation. Latency was lowest for on-device inference (3.33 ms), increasing when offloading to the gateway (146.67 ms) or cloud (641.71 ms). The ESN-PdM framework provides a scalable, adaptive solution for reliable anomaly detection and PdM, crucial for maintaining machinery uptime in remote environments. By balancing accuracy, latency, and energy consumption, this approach advances PdM frameworks for industrial applications.
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Middle East > Yemen > Amran Governorate > Amran (0.04)
- (3 more...)
- Materials > Metals & Mining (1.00)
- Information Technology > Services (1.00)
- Energy (1.00)
Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective
Restrepo, David, Wu, Chenwei, Vásquez-Venegas, Constanza, Matos, João, Gallifant, Jack, Filipe, Luis
The deployment of large language models (LLMs) in healthcare has demonstrated substantial potential for enhancing clinical decision-making, administrative efficiency, and patient outcomes. However, the underrepresentation of diverse groups in the development and application of these models can perpetuate biases, leading to inequitable healthcare delivery. This paper presents a comprehensive scientometric analysis of LLM research for healthcare, including data from January 1, 2021, to June 16, 2024. By analyzing metadata from PubMed and Dimensions, including author affiliations, countries, and funding sources, we assess the diversity of contributors to LLM research. Our findings highlight significant gender and geographic disparities, with a predominance of male authors and contributions primarily from high-income countries (HICs). We introduce a novel journal diversity index based on Gini impurity to measure the inclusiveness of scientific publications. Our results underscore the necessity for greater representation in order to ensure the equitable application of LLMs in healthcare. We propose actionable strategies to enhance diversity and inclusivity in artificial intelligence research, with the ultimate goal of fostering a more inclusive and equitable future in healthcare innovation.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Asia (0.06)
- (6 more...)
- Health & Medicine > Health Care Technology (0.47)
- Health & Medicine > Diagnostic Medicine > Imaging (0.47)
- Health & Medicine > Therapeutic Area (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
An economically-consistent discrete choice model with flexible utility specification based on artificial neural networks
Hernandez, Jose Ignacio, Mouter, Niek, van Cranenburgh, Sander
Random utility maximisation (RUM) models are one of the cornerstones of discrete choice modelling. However, specifying the utility function of RUM models is not straightforward and has a considerable impact on the resulting interpretable outcomes and welfare measures. In this paper, we propose a new discrete choice model based on artificial neural networks (ANNs) named "Alternative-Specific and Shared weights Neural Network (ASS-NN)", which provides a further balance between flexible utility approximation from the data and consistency with two assumptions: RUM theory and fungibility of money (i.e., "one euro is one euro"). Therefore, the ASS-NN can derive economically-consistent outcomes, such as marginal utilities or willingness to pay, without explicitly specifying the utility functional form. Using a Monte Carlo experiment and empirical data from the Swissmetro dataset, we show that ASS-NN outperforms (in terms of goodness of fit) conventional multinomial logit (MNL) models under different utility specifications. Furthermore, we show how the ASS-NN is used to derive marginal utilities and willingness to pay measures.
- Europe > Netherlands > South Holland > Delft (0.04)
- South America > Chile > Biobío Region > Concepción Province > Concepción (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)